Regional Attention Network (RAN) for Head Pose and Fine-Grained Gesture Recognition
Authors
Abstract
Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators for human behaviors. Recent studies on recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling the spatial configuration of body parts representing body pose, human-objects interactions, and variations in local appearance. The results show that this is a brittle approach since it relies on accurate body parts/objects detection. In this work, we argue that there exist local discriminative semantic regions, whose “informativeness” can be evaluated by the attention mechanism for inferring fine-grained gestures/actions. To this end, we propose a novel end-to-end Regional Attention Network (RAN), a fully Convolutional Neural Network (CNN) to combine multiple contextual regions through an attention mechanism, focusing on the parts of the images that are most relevant to a given task. Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing the HOG (Histogram of Oriented Gradient) descriptor. The model is extensively evaluated on ten datasets belonging to 3 different scenarios: 1) head pose recognition, 2) drivers state recognition, and 3) human action and facial expression recognition. The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.
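The region-and-attention idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the abstract does not say how regions are pooled or scored, so the average pooling, the linear scoring vector `w`, and the helper names `cell_regions` and `attend_regions` are all assumptions made for illustration only.

```python
import numpy as np

def cell_regions(grid_rows, grid_cols):
    """Enumerate rectangles made of one or more consecutive grid cells.

    Each region is (r0, r1, c0, c1) in cell units, half-open ranges.
    """
    regions = []
    for r0 in range(grid_rows):
        for r1 in range(r0 + 1, grid_rows + 1):
            for c0 in range(grid_cols):
                for c1 in range(c0 + 1, grid_cols + 1):
                    regions.append((r0, r1, c0, c1))
    return regions

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def attend_regions(feature_map, regions, cell_h, cell_w, w):
    """Combine region features with attention weights.

    feature_map: array of shape (H, W, C), e.g. a CNN output.
    w: hypothetical scoring vector of shape (C,), learned in practice.
    Average pooling and linear scoring are assumptions, not the paper's design.
    """
    pooled = np.stack([
        feature_map[r0 * cell_h:r1 * cell_h, c0 * cell_w:c1 * cell_w].mean(axis=(0, 1))
        for (r0, r1, c0, c1) in regions
    ])                          # shape (R, C): one descriptor per region
    alpha = softmax(pooled @ w)  # shape (R,): one attention weight per region
    return alpha @ pooled, alpha # weighted sum of region descriptors, weights

# Usage with random stand-in data: a 2x2 grid of 4x4-pixel cells.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 16))
regions = cell_regions(2, 2)
w = rng.standard_normal(16)
context, alpha = attend_regions(fmap, regions, cell_h=4, cell_w=4, w=w)
```

Here `alpha` plays the role of the per-region "informativeness" and `context` is the attention-weighted summary that a classifier head would consume. In a trained network the scoring parameters would be learned end to end rather than drawn at random.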
Similar resources
Fine-grained pose prediction, normalization, and recognition
Pose variation and subtle differences in appearance are key challenges to fine-grained classification. While deep networks have markedly improved general recognition, many approaches to fine-grained recognition rely on anchoring networks to parts for better accuracy. Identifying parts to find correspondence discounts pose variation so that features can be tuned to appearance. To this end previou...
Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem with many applications, such as aiding gaze estimation, modeling attention, fitting 3D models to video, and performing face alignment. Traditionally, head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is ...
Fine-Grained Activity Recognition with Holistic and Pose Based Features
Holistic methods based on dense trajectories [29, 30] are currently the de facto standard for recognition of human activities in video. Whether holistic representations will sustain or will be superseded by higher level video encoding in terms of body pose and motion is the subject of an ongoing debate [12]. In this paper we aim to clarify the underlying factors responsible for good performance...
Attention for Fine-Grained Categorization
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, beginning with fine-grained categorization on the Stanford Dogs data set. In this work we use an RNN of the same structure but substitute a more powerful visual network and perform large-scale pre-training of the visual network outside of the...
Head Gesture Recognition Based on Bayesian Network
Head gestures such as nodding and shaking are often used as a form of human body language for communication, and their recognition plays an important role in the development of Human-Computer Interaction (HCI). Because a head gesture is a continuous motion over a time series, the key problems of recognition are to track the multi-view head and understand the head pose transformat...
Journal
Journal title: IEEE Transactions on Affective Computing
Year: 2023
ISSN: 1949-3045, 2371-9850
DOI: https://doi.org/10.1109/taffc.2020.3031841